The objective of this proposal is to provide a high-level interface that gives programmers using lcc2 access to all the new MMX instructions.
The MMX instruction set is accessible through intrinsic functions that are recognized and inlined by the compiler.
The data type used by
all MMX intrinsics is an 8 byte union, described in ‘mmx.h’. The interface is
designed to work at maximum speed when vectors of this datatype are used. The
internal loop necessary to apply the given operation to all elements of the
data vectors is generated in-line. The dimensions of both arrays should be
identical.
Scalar extension is provided: one of the inputs to an MMX intrinsic can be a scalar, which the compiler automatically extends so that the MMX operation is applied to all elements of the input vector.
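The vector form and the scalar extension can be illustrated with a portable reference model. This is only a sketch: `mmx_model` is a hypothetical stand-in for the 8-byte union in ‘mmx.h’ (whose member names are not given here), and `model_add_bytes` / `model_add_bytes_scalar` are illustrative names, not intrinsics of the compiler. The model uses wraparound byte addition as the sample operation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the 8-byte union declared in mmx.h. */
typedef union {
    uint8_t  b[8];   /* eight packed bytes */
    uint16_t w[4];   /* four packed words */
    uint32_t d[2];   /* two packed doublewords */
    uint64_t q;      /* one quadword */
} mmx_model;

/* Vector form: both arrays hold n elements; the result overwrites a. */
void model_add_bytes(mmx_model *a, const mmx_model *b, int n)
{
    assert(a != NULL && b != NULL && n >= 0);
    for (int i = 0; i < n; i++)
        for (int lane = 0; lane < 8; lane++)
            a[i].b[lane] = (uint8_t)(a[i].b[lane] + b[i].b[lane]); /* wraps mod 256 */
}

/* Scalar extension: the single element *imm is applied to every element of a. */
void model_add_bytes_scalar(mmx_model *a, const mmx_model *imm, int n)
{
    assert(a != NULL && imm != NULL && n >= 0);
    for (int i = 0; i < n; i++)
        for (int lane = 0; lane < 8; lane++)
            a[i].b[lane] = (uint8_t)(a[i].b[lane] + imm->b[lane]);
}
```

The only difference between the two loops is that the second operand is indexed in the vector form and held fixed in the scalar form.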
Since the MMX
instructions and floating point instructions are incompatible, it is assumed
that a function does not mix floating point and mmx. An emms instruction will
be issued in the function epilogue if the mmx instruction set is used.
The assembler interface is still available, and assembler instructions can be used directly. In this case, it is the programmer’s responsibility to issue the ‘emms’ instruction.
Instructions vary by:
· Data type: packed bytes, packed words, packed doublewords or quadwords
· Signed or unsigned numbers
· Wraparound or saturating arithmetic
· Scalar or vector data
A typical MMX instruction has this syntax:
· Prefix:
    · ‘_’ to indicate that this is a compiler reserved word.
    · ‘p’ for Packed, as Intel suggests.[1]
· Instruction operation: for example ADD, CMP, or XOR
· Suffix:
    · US for Unsigned Saturation
    · S for Signed Saturation
    · B, W, D, Q for the data type: packed byte, packed word, packed doubleword, or quadword.
    · ‘i’ for ‘immediate’ (scalar) data. If this suffix is not present, the function operates over two arrays.
The pack operations work on words (packed to bytes) or on doublewords (packed to words).
void _stdcall _packsswb(_mmxdata *array1,_mmxdata *array2,int n);
Description
Each element of array1 will be packed with the corresponding element of array2. The result is written to array1. The number of elements of both arrays is given by ‘n’.
Mode of operation:
while (n-- > 0) {
    array1[n](7..0) = SaturateSignedWordToSignedByte array1[n](15..0);
    array1[n](15..8) = SaturateSignedWordToSignedByte array1[n](31..16);
    array1[n](23..16) = SaturateSignedWordToSignedByte array1[n](47..32);
    array1[n](31..24) = SaturateSignedWordToSignedByte array1[n](63..48);
    array1[n](39..32) = SaturateSignedWordToSignedByte array2[n](15..0);
    array1[n](47..40) = SaturateSignedWordToSignedByte array2[n](31..16);
    array1[n](55..48) = SaturateSignedWordToSignedByte array2[n](47..32);
    array1[n](63..56) = SaturateSignedWordToSignedByte array2[n](63..48);
}
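The pseudocode above can be checked against a small portable model. This is a sketch under stated assumptions: `mmx_model`, `saturate_word_to_sbyte` and `model_packsswb` are hypothetical names for illustration only, and lane 0 is taken to be bits 15..0 (bytes: 7..0), which matches a little-endian x86 layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the 8-byte union declared in mmx.h. */
typedef union {
    int8_t  sb[8];   /* eight signed bytes */
    int16_t sw[4];   /* four signed words */
} mmx_model;

/* Clamp a signed word into the signed byte range [-128, 127]. */
int8_t saturate_word_to_sbyte(int16_t v)
{
    if (v > INT8_MAX) return INT8_MAX;
    if (v < INT8_MIN) return INT8_MIN;
    return (int8_t)v;
}

/* Reference model: the low four result bytes come from a[i], the high four from b[i]. */
void model_packsswb(mmx_model *a, const mmx_model *b, int n)
{
    assert(a != NULL && b != NULL && n >= 0);
    for (int i = 0; i < n; i++) {
        mmx_model r;   /* temporary, so source words are not overwritten while still needed */
        for (int lane = 0; lane < 4; lane++) {
            r.sb[lane]     = saturate_word_to_sbyte(a[i].sw[lane]);
            r.sb[lane + 4] = saturate_word_to_sbyte(b[i].sw[lane]);
        }
        a[i] = r;
    }
}
```

The temporary `r` matters because the operation is in place: writing the packed bytes directly into `a[i]` would clobber words that have not been read yet.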
void _stdcall _packsswbi(_mmxdata *array,_mmxdata *imm,int n);
Description
Each element of array will be packed with imm. The result is written to array. The number of elements of array is given by ‘n’.
Mode of operation:
while (n-- > 0) {
    array[n](7..0) = SaturateSignedWordToSignedByte array[n](15..0);
    array[n](15..8) = SaturateSignedWordToSignedByte array[n](31..16);
    array[n](23..16) = SaturateSignedWordToSignedByte array[n](47..32);
    array[n](31..24) = SaturateSignedWordToSignedByte array[n](63..48);
    array[n](39..32) = SaturateSignedWordToSignedByte imm(15..0);
    array[n](47..40) = SaturateSignedWordToSignedByte imm(31..16);
    array[n](55..48) = SaturateSignedWordToSignedByte imm(47..32);
    array[n](63..56) = SaturateSignedWordToSignedByte imm(63..48);
}
void _stdcall _packssdw(_mmxdata *array1,_mmxdata *array2,int n);
Description
Each element of array1 will be packed with the corresponding element of array2. The result is written to array1. The number of elements of both arrays is given by ‘n’.
Mode of operation:
while (n-- > 0) {
    array1[n](15..0) = SaturateSignedDwordToSignedWord array1[n](31..0);
    array1[n](31..16) = SaturateSignedDwordToSignedWord array1[n](63..32);
    array1[n](47..32) = SaturateSignedDwordToSignedWord array2[n](31..0);
    array1[n](63..48) = SaturateSignedDwordToSignedWord array2[n](63..32);
}
void _stdcall _packssdwi(_mmxdata *array,_mmxdata *imm,int n);
Description
Each element of array will be packed with imm. The result is written to array. The number of elements of array is given by ‘n’.
Mode of operation:
while (n-- > 0) {
    array[n](15..0) = SaturateSignedDwordToSignedWord array[n](31..0);
    array[n](31..16) = SaturateSignedDwordToSignedWord array[n](63..32);
    array[n](47..32) = SaturateSignedDwordToSignedWord imm(31..0);
    array[n](63..48) = SaturateSignedDwordToSignedWord imm(63..32);
}
void _stdcall _packuswb(_mmxdata *array1,_mmxdata *array2,int n);
Description
Each element of array1 will be packed with the corresponding element of array2. The result is written to array1. The number of elements of both arrays is given by ‘n’.
Mode of operation:
while (n-- > 0) {
    array1[n](7..0) = SaturateSignedWordToUnsignedByte array1[n](15..0);
    array1[n](15..8) = SaturateSignedWordToUnsignedByte array1[n](31..16);
    array1[n](23..16) = SaturateSignedWordToUnsignedByte array1[n](47..32);
    array1[n](31..24) = SaturateSignedWordToUnsignedByte array1[n](63..48);
    array1[n](39..32) = SaturateSignedWordToUnsignedByte array2[n](15..0);
    array1[n](47..40) = SaturateSignedWordToUnsignedByte array2[n](31..16);
    array1[n](55..48) = SaturateSignedWordToUnsignedByte array2[n](47..32);
    array1[n](63..56) = SaturateSignedWordToUnsignedByte array2[n](63..48);
}
void _stdcall _packuswbi(_mmxdata *array,_mmxdata *imm,int n);
Description
Each element of array will be packed with imm. The result is written to array. The number of elements of array is given by ‘n’.
Mode of operation:
while (n-- > 0) {
    array[n](7..0) = SaturateSignedWordToUnsignedByte array[n](15..0);
    array[n](15..8) = SaturateSignedWordToUnsignedByte array[n](31..16);
    array[n](23..16) = SaturateSignedWordToUnsignedByte array[n](47..32);
    array[n](31..24) = SaturateSignedWordToUnsignedByte array[n](63..48);
    array[n](39..32) = SaturateSignedWordToUnsignedByte imm(15..0);
    array[n](47..40) = SaturateSignedWordToUnsignedByte imm(31..16);
    array[n](55..48) = SaturateSignedWordToUnsignedByte imm(47..32);
    array[n](63..56) = SaturateSignedWordToUnsignedByte imm(63..48);
}
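The unsigned pack with a scalar operand can be modelled in portable C as follows. This is a sketch, not the compiler's generated code: `mmx_model`, `saturate_word_to_ubyte` and `model_packuswbi` are hypothetical names, and lane numbering assumes a little-endian layout. Since imm is the same for every element, its four saturated bytes are computed once, outside the loop.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the 8-byte union declared in mmx.h. */
typedef union {
    uint8_t ub[8];   /* eight unsigned bytes */
    int16_t sw[4];   /* four signed words */
} mmx_model;

/* Clamp a signed word into the unsigned byte range [0, 255]. */
uint8_t saturate_word_to_ubyte(int16_t v)
{
    if (v < 0) return 0;
    if (v > UINT8_MAX) return UINT8_MAX;
    return (uint8_t)v;
}

/* Reference model of the scalar ('i') form: imm supplies the high four bytes. */
void model_packuswbi(mmx_model *a, const mmx_model *imm, int n)
{
    uint8_t high[4];

    assert(a != NULL && imm != NULL && n >= 0);
    for (int lane = 0; lane < 4; lane++)
        high[lane] = saturate_word_to_ubyte(imm->sw[lane]);  /* loop-invariant */

    for (int i = 0; i < n; i++) {
        mmx_model r;
        for (int lane = 0; lane < 4; lane++) {
            r.ub[lane]     = saturate_word_to_ubyte(a[i].sw[lane]);
            r.ub[lane + 4] = high[lane];
        }
        a[i] = r;
    }
}
```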
Packed add byte
void _stdcall _paddb(_mmxdata *array1,_mmxdata *array2,int n);
Description
Each element of array1 will be added to the corresponding element of array2. The result is written to array1. The number of elements of both arrays is given by ‘n’.
Mode of operation:
while (n-- > 0) {
    array1[n](7..0) = array1[n](7..0) + array2[n](7..0);
    array1[n](15..8) = array1[n](15..8) + array2[n](15..8);
    array1[n](23..16) = array1[n](23..16) + array2[n](23..16);
    array1[n](31..24) = array1[n](31..24) + array2[n](31..24);
    array1[n](39..32) = array1[n](39..32) + array2[n](39..32);
    array1[n](47..40) = array1[n](47..40) + array2[n](47..40);
    array1[n](55..48) = array1[n](55..48) + array2[n](55..48);
    array1[n](63..56) = array1[n](63..56) + array2[n](63..56);
}
void _stdcall _paddbi(_mmxdata *array,_mmxdata *imm,int n);
Description
imm will be added to each element of array. The result is written to array. The number of elements of array is given by ‘n’.
Mode of operation:
while (n-- > 0) {
    array[n](7..0) = array[n](7..0) + imm(7..0);
    array[n](15..8) = array[n](15..8) + imm(15..8);
    array[n](23..16) = array[n](23..16) + imm(23..16);
    array[n](31..24) = array[n](31..24) + imm(31..24);
    array[n](39..32) = array[n](39..32) + imm(39..32);
    array[n](47..40) = array[n](47..40) + imm(47..40);
    array[n](55..48) = array[n](55..48) + imm(55..48);
    array[n](63..56) = array[n](63..56) + imm(63..56);
}
Packed add word
void _stdcall _paddw(_mmxdata *array1,_mmxdata *array2,int n);
Description
Each element of array1 will be added to the corresponding element of array2. The result is written to array1. The number of elements of both arrays is given by ‘n’.
Mode of operation:
while (n-- > 0) {
    array1[n](15..0)<--array1[n](15..0) + array2[n](15..0);
    array1[n](31..16)<--array1[n](31..16) + array2[n](31..16);
    array1[n](47..32)<--array1[n](47..32) + array2[n](47..32);
    array1[n](63..48)<--array1[n](63..48) + array2[n](63..48);
}
void _stdcall _paddwi(_mmxdata *array,_mmxdata *imm,int n);
Description
imm will be added to each element of array. The result is written to array. The number of elements of array is given by ‘n’.
Mode of operation:
while (n-- > 0) {
    array[n](15..0)<--array[n](15..0) + imm(15..0);
    array[n](31..16)<--array[n](31..16) + imm(31..16);
    array[n](47..32)<--array[n](47..32) + imm(47..32);
    array[n](63..48)<--array[n](63..48) + imm(63..48);
}
Packed add double word
void _stdcall _paddd(_mmxdata *array1,_mmxdata *array2,int n);
Description
Each element of array1 will be added to the corresponding element of array2. The result is written to array1. The number of elements of both arrays is given by ‘n’.
Mode of operation:
while (n-- > 0) {
    array1[n](31..0)<--array1[n](31..0) + array2[n](31..0);
    array1[n](63..32)<--array1[n](63..32) + array2[n](63..32);
}
void _stdcall _padddi(_mmxdata *array,_mmxdata *imm,int n);
Description
imm will be added to each element of array. The result is written to array. The number of elements of array is given by ‘n’.
Mode of operation:
while (n-- > 0) {
    array[n](31..0)<--array[n](31..0) + imm(31..0);
    array[n](63..32)<--array[n](63..32) + imm(63..32);
}
Packed add byte with saturation
a) Signed variants
void _stdcall _paddsb(_mmxdata *array1,_mmxdata *array2,int n);
void _stdcall _paddsbi(_mmxdata *array1,_mmxdata *imm,int n);
b) Unsigned variants
void _stdcall _paddusb(_mmxdata *array1,_mmxdata *array2,int n);
void _stdcall _paddusbi(_mmxdata *array1,_mmxdata *imm,int n);
Description
Each element of array1 will be added to the corresponding element of array2 (or to imm for the ‘i’ variants). The result is written to array1. The number of elements is given by ‘n’.
For the signed operation, the result of the add is saturated to 0x7F on overflow or to 0x80 on underflow.
For the unsigned operation, the saturation values are 0xFF and 0x00 in case of overflow/underflow.
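Saturating byte addition can be sketched in portable C by widening to int before clamping. The names below (`mmx_model`, `add_sat_s8`, `add_sat_u8`, `model_paddsb`) are hypothetical and only model the described behaviour; they are not the intrinsics themselves.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the 8-byte union declared in mmx.h. */
typedef union {
    int8_t  sb[8];   /* eight signed bytes */
    uint8_t ub[8];   /* eight unsigned bytes */
} mmx_model;

/* Signed saturating add: clamps to 0x7F (127) or 0x80 (-128). */
int8_t add_sat_s8(int8_t x, int8_t y)
{
    int sum = (int)x + (int)y;       /* widen so the true sum is representable */
    if (sum > INT8_MAX) return INT8_MAX;
    if (sum < INT8_MIN) return INT8_MIN;
    return (int8_t)sum;
}

/* Unsigned saturating add: clamps to 0xFF. An unsigned add cannot go below 0x00. */
uint8_t add_sat_u8(uint8_t x, uint8_t y)
{
    unsigned sum = (unsigned)x + (unsigned)y;
    return sum > UINT8_MAX ? (uint8_t)UINT8_MAX : (uint8_t)sum;
}

/* Reference model of the signed vector form over n 8-byte elements. */
void model_paddsb(mmx_model *a, const mmx_model *b, int n)
{
    assert(a != NULL && b != NULL && n >= 0);
    for (int i = 0; i < n; i++)
        for (int lane = 0; lane < 8; lane++)
            a[i].sb[lane] = add_sat_s8(a[i].sb[lane], b[i].sb[lane]);
}
```

For example, `add_sat_s8(100, 100)` yields 127 where a wraparound add would yield -56.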
Packed add word with saturation
void _stdcall _paddsw(_mmxdata *array1,_mmxdata *array2,int n);
void _stdcall _paddswi(_mmxdata *array1,_mmxdata *imm,int n);
Description
Same operation as in paddsb above. The saturation values are 0x7FFF and 0x8000 for the signed operation, and 0xFFFF and 0x0000 for the unsigned operation.
void _stdcall _pand(_mmxdata *array1,_mmxdata *array2,int n);
void _stdcall _pandi(_mmxdata *array1,_mmxdata *imm,int n);
The bitwise logical AND operation is performed between corresponding 64-bit elements of the arrays (or between each element of array1 and imm for _pandi). The result is written to array1.
void _stdcall _pandn(_mmxdata *array1,_mmxdata *array2,int n);
void _stdcall _pandni(_mmxdata *array1,_mmxdata *imm,int n);
First, a bitwise logical NOT is performed on the 64 bits of each element of the source operand (array2), inverting each bit. Then the bitwise logical AND operation is performed between corresponding 64-bit elements of the arrays. The result is written to array1.
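The two logical operations can be modelled on plain 64-bit integers. This sketch follows the wording of this text, in which the inverted operand is array2; note that Intel's PANDN instruction itself inverts the destination operand, so the operand mapping is worth checking against the compiler. `model_pand` and `model_pandn_as_described` are hypothetical names, and each `uint64_t` stands in for one 8-byte `_mmxdata` element.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* a[i] = a[i] AND b[i], one 64-bit element at a time. */
void model_pand(uint64_t *a, const uint64_t *b, int n)
{
    assert(a != NULL && b != NULL && n >= 0);
    for (int i = 0; i < n; i++)
        a[i] &= b[i];
}

/* As described in the text: invert the source (b), then AND with a. */
void model_pandn_as_described(uint64_t *a, const uint64_t *b, int n)
{
    assert(a != NULL && b != NULL && n >= 0);
    for (int i = 0; i < n; i++)
        a[i] &= ~b[i];
}
```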
void _stdcall _replicatebyte(_mmxdata *dst,unsigned char c);
void _stdcall _replicateword(_mmxdata *dst,unsigned short w);
void _stdcall _replicatedword(_mmxdata *dst,unsigned int i);
These instructions replicate either a byte, a word or a doubleword into the MMX data pointed to by the ‘dst’ argument. They are mainly intended for preparing comparison operands.
The first 64 bits of the first argument will be filled with the given integer, either as bytes, words, or doublewords.
Example:
_replicatebyte(&mmxdata,' ');
Then, mmxdata will contain 8 spaces, and can later be used as an argument for comparison functions.
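The effect of the three replicate functions can be sketched with a portable model. `mmx_model` and the `model_replicate*` functions are hypothetical names used only to illustrate the fill pattern; the real union layout in ‘mmx.h’ is not shown in this text.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the 8-byte union declared in mmx.h. */
typedef union {
    uint8_t  b[8];
    uint16_t w[4];
    uint32_t d[2];
} mmx_model;

/* Fill all eight byte lanes with c. */
void model_replicatebyte(mmx_model *dst, unsigned char c)
{
    assert(dst != NULL);
    memset(dst->b, c, sizeof dst->b);
}

/* Fill all four word lanes with w. */
void model_replicateword(mmx_model *dst, unsigned short w)
{
    assert(dst != NULL);
    for (int lane = 0; lane < 4; lane++)
        dst->w[lane] = (uint16_t)w;
}

/* Fill both doubleword lanes with v. */
void model_replicatedword(mmx_model *dst, unsigned int v)
{
    assert(dst != NULL);
    dst->d[0] = (uint32_t)v;
    dst->d[1] = (uint32_t)v;
}
```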
int _stdcall _reduceBooleanb(_mmxdata *map,int n);
int _stdcall _reduceCmpeqb(_mmxdata *map,_mmxdata *imm,int n);
int _stdcall _reduceGtb(_mmxdata *map,_mmxdata *imm,int n);
int _stdcall _reduceLtb(_mmxdata *map,_mmxdata *imm,int n);
These instructions reduce a boolean vector by counting its non-zero members, and return the count as a 32-bit integer.
_reduceBooleanb sums all true bytes (11111111) in a logical vector that is the result of a previous comparison.
_reduceCmpeqb performs a comparison and then adds up the hits.
_reduceLtb and _reduceGtb test for less than or greater than, and add up the ‘true’ bytes.
‘True’ bytes are those set to all ones (11111111b, 0xFF, or 255 decimal) by a previous MMX logical operation.
Example:
If the MMX data element ‘space’ contains 8 space bytes (32), the following will count the number of spaces in the character vector ‘data’:
_reduceCmpeqb(data,&space,len/8);
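The counting behaviour described above can be sketched with a portable model. `mmx_model`, `model_reduce_booleanb` and `model_reduce_cmpeqb` are hypothetical names, and the model assumes that each byte of an element is compared with the byte of imm in the same lane.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the 8-byte union declared in mmx.h. */
typedef union {
    uint8_t b[8];
} mmx_model;

/* Count the 'true' (0xFF) bytes in a boolean vector of n elements. */
int model_reduce_booleanb(const mmx_model *map, int n)
{
    int count = 0;

    assert(map != NULL && n >= 0);
    for (int i = 0; i < n; i++)
        for (int lane = 0; lane < 8; lane++)
            if (map[i].b[lane] == 0xFF)
                count++;
    return count;
}

/* Compare every byte with the matching byte of imm and count the hits. */
int model_reduce_cmpeqb(const mmx_model *map, const mmx_model *imm, int n)
{
    int hits = 0;

    assert(map != NULL && imm != NULL && n >= 0);
    for (int i = 0; i < n; i++)
        for (int lane = 0; lane < 8; lane++)
            if (map[i].b[lane] == imm->b[lane])
                hits++;
    return hits;
}
```

Because the count argument is in 8-byte elements, a call such as the len/8 example only covers whole elements; any trailing bytes beyond a multiple of 8 would need separate handling.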